
    Pessimistic Software Lock-Elision

    Read-write locks are one of the most prevalent lock forms in concurrent applications because they allow read accesses to locked code to proceed in parallel. However, they do not offer any parallelism between reads and writes. This paper introduces pessimistic lock-elision (PLE), a new approach for non-speculatively replacing read-write locks with pessimistic (i.e., non-aborting) software transactional code that allows read-write concurrency even for contended code and even if the code includes system calls. On systems with hardware transactional support, PLE allows failed transactions, or ones that contain system calls, to preserve read-write concurrency. Our PLE algorithm is based on a novel encounter-order design of a fully pessimistic STM system that, across a variety of benchmarks ranging from counters to trees, provides up to 5 times the performance of a state-of-the-art read-write lock, even when up to 40% of calls mutate the locked structure. National Science Foundation (U.S.) (Grant 1217921)
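
    As a concrete illustration of the limitation the abstract starts from, the sketch below uses a standard std::shared_mutex: readers of the protected structure run in parallel with each other, but any writer excludes every reader. This is only the read-write-lock baseline, not the paper's PLE algorithm; the class and field names are made up for the example.

        // Baseline read-write locking: shared (read) acquisitions proceed in
        // parallel, but a unique (write) acquisition blocks all readers, so
        // there is no read-write concurrency. PLE targets exactly this gap.
        #include <map>
        #include <shared_mutex>
        #include <string>

        class Directory {
            mutable std::shared_mutex rw_;        // conventional read-write lock
            std::map<std::string, int> entries_;
        public:
            int lookup(const std::string& key) const {
                std::shared_lock lock(rw_);       // many readers may hold this at once
                auto it = entries_.find(key);
                return it == entries_.end() ? -1 : it->second;
            }
            void update(const std::string& key, int value) {
                std::unique_lock lock(rw_);       // blocks every concurrent lookup()
                entries_[key] = value;
            }
        };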

    The Cost of Privatization

    Software transactional memory (STM) guarantees that a transaction, consisting of a sequence of operations on the memory, appears to be executed atomically. In practice, it is important to be able to run transactions together with non-transactional legacy code accessing the same memory locations, by supporting privatization. Privatization should be provided without sacrificing the parallelism offered by today’s multicore systems and multiprocessors. This paper proves an inherent cost for supporting privatization, which is linear in the number of privatized items. Specifically, we show that a transaction privatizing k items must have a data set of size at least k, in an STM with invisible reads which is oblivious to different non-conflicting executions and guarantees progress in such executions. When reads are visible, it is shown that Ω(k) memory locations must be accessed by a privatizing transaction, where k is the minimum between the number of privatized items and the number of concurrent transactions guaranteed to make progress, thus capturing the tradeoff between the cost of privatization and the parallelism offered by the STM.
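
    Restating the two lower bounds compactly (the symbol c below, denoting the number of concurrent transactions guaranteed to make progress, is introduced here only for readability and is not the paper's notation):

        % Privatization lower bounds as stated in the abstract, for a
        % transaction T that privatizes k items.
        \begin{align*}
          \text{invisible reads:} \quad & |\mathrm{DataSet}(T)| \ge k \\
          \text{visible reads:}   \quad & \#\{\text{memory locations accessed by } T\} = \Omega\bigl(\min(k,\, c)\bigr)
        \end{align*}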

    Efficient mapping of irregular c++ applications to integrated gpus

    There is growing interest in using GPUs to accelerate general-purpose computation, since they offer the potential of massive parallelism with reduced energy consumption. This interest has been encouraged by the ubiquity of integrated processors that combine a GPU and CPU on the same die, lowering the cost of offloading work to the GPU. However, while the majority of effort has focused on GPU acceleration of regular applications, relatively little is known about the behavior of irregular applications on GPUs. These applications are expected to perform poorly on GPUs without major software engineering effort. We present a compiler framework with support for C++ features that enables GPU acceleration of a wide range of C++ applications with minimal changes. This framework, Concord, includes a low-cost software SVM implementation that permits seamless sharing of pointer-containing data structures between the CPU and GPU. It also includes compiler optimizations to improve irregular application performance on GPUs. Using Concord, we ran nine irregular C++ programs on two computer systems containing Intel 4th Generation Core processors. One system is an Ultrabook with an integrated HD Graphics 5000 GPU, and the other system is a desktop with an integrated HD Graphics 4600 GPU. The nine applications are pointer-intensive and operate on irregular data structures such as trees and graphs; they include face detection, BTree, single-source shortest path, soft-body physics simulation, and breadth-first search. Our results show that Concord acceleration using the GPU improves energy efficiency by up to 6.04× on the Ultrabook and 3.52× on the desktop.
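
    To make "irregular, pointer-containing" concrete, the sketch below runs pointer-chasing searches over a linked binary tree inside a data-parallel loop. The offload helper is a plain sequential placeholder rather than Concord's actual API (which is not given in this abstract); the point is that with software shared virtual memory, the same Node pointers built on the CPU would remain valid inside a loop body offloaded to the integrated GPU.

        // Irregular workload sketch: each query walks a linked tree, so control
        // flow and memory accesses are data-dependent. The offload() helper is
        // a sequential stand-in for a GPU parallel-for, used only to show where
        // the shared-pointer traversal would run.
        #include <cstddef>
        #include <vector>

        struct Node {
            int key;
            Node* left = nullptr;
            Node* right = nullptr;
        };

        bool contains(const Node* n, int key) {
            while (n) {                                     // pointer chasing
                if (key == n->key) return true;
                n = (key < n->key) ? n->left : n->right;    // data-dependent branch
            }
            return false;
        }

        template <typename F>
        void offload(std::size_t n, F body) {               // placeholder for a GPU offload
            for (std::size_t i = 0; i < n; ++i) body(i);
        }

        void count_hits(const Node* root, const std::vector<int>& queries,
                        std::vector<char>& hit) {
            offload(queries.size(), [&](std::size_t i) {
                hit[i] = contains(root, queries[i]);        // same pointers on CPU and GPU
            });
        }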

    Compiler and Runtime Support for Efficient Software Transactional Memory

    Programmers have traditionally used locks to synchronize concurrent access to shared data. Lock-based synchronization, however, has well-known pitfalls: using locks for fine-grain synchronization and composing code that already uses locks are both difficult and prone to deadlock. Transactional memory provides an alternate concurrency control mechanism that avoids these pitfalls and significantly eases concurrent programming. Transactional memory language constructs have recently been proposed as extensions to existing languages or included in new concurrent language specifications, opening the door for new compiler optimizations that target the overheads of transactional memory. This paper presents compiler and runtime optimizations for transactional memory language constructs. We present a high-performance software transactional memory system (STM) integrated into a managed runtime environment. Our system efficiently implements nested transactions that support both composition of transactions and partial rollback. Our JIT compiler is the first to optimize the overheads of STM, and we show novel techniques for enabling JIT optimizations on STM operations. We measure the performance of our optimizations on a 16-way SMP running multi-threaded transactional workloads. Our results show that these techniques enable transactional memory’s performance to compete with that of well-tuned synchronization.
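
    A minimal single-threaded sketch of the "partial rollback" behavior the abstract attributes to nested transactions (an illustration of the semantics using an undo log, not the paper's STM implementation): aborting an inner transaction undoes only the writes made since it began, while the enclosing transaction's earlier writes survive and can still commit.

        // Partial rollback via an undo log: a savepoint marks where a nested
        // transaction starts, and rolling back to it restores only the values
        // written after that point.
        #include <cassert>
        #include <cstddef>
        #include <utility>
        #include <vector>

        class UndoLog {
            std::vector<std::pair<int*, int>> log_;   // (address, old value)
        public:
            std::size_t savepoint() const { return log_.size(); }
            void write(int* addr, int value) {        // record old value, then update
                log_.emplace_back(addr, *addr);
                *addr = value;
            }
            void rollback_to(std::size_t sp) {        // undo writes made after sp
                while (log_.size() > sp) {
                    *log_.back().first = log_.back().second;
                    log_.pop_back();
                }
            }
        };

        int main() {
            int x = 0, y = 0;
            UndoLog tx;                      // outer transaction
            tx.write(&x, 1);                 // outer write
            std::size_t sp = tx.savepoint(); // nested transaction begins
            tx.write(&y, 2);                 // inner write
            tx.rollback_to(sp);              // abort only the inner transaction
            assert(x == 1 && y == 0);        // outer effect kept, inner effect undone
            return 0;
        }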